Interactive Data Visualization

Column

Breast Cancer Analysis

Tumor

Patients in US

569

Malignant

White

190

African American

154

Hispanic

111

Asian

94

Row

Plot 1

Plot 2

Map

Data exploration

Row

Distribution of Cell Perimeter

Distribution of Cell Area

Row

Distribution of Cell Compactness

Distribution of Cell Radius Worst

K-means Model

Column

Confusion Matrix

Confusion Matrix and Statistics

           Reference
Prediction  Benign Malignant
  Benign       107        29
  Malignant      0        35
                                          
               Accuracy : 0.8304          
                 95% CI : (0.7656, 0.8834)
    No Information Rate : 0.6257          
    P-Value [Acc > NIR] : 3.994e-09       
                                          
                  Kappa : 0.6017          
 Mcnemar's Test P-Value : 1.999e-07       
                                          
            Sensitivity : 1.0000          
            Specificity : 0.5469          
         Pos Pred Value : 0.7868          
         Neg Pred Value : 1.0000          
             Prevalence : 0.6257          
         Detection Rate : 0.6257          
   Detection Prevalence : 0.7953          
      Balanced Accuracy : 0.7734          
                                          
       'Positive' Class : Benign          
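
The statistics above follow directly from the four cell counts. As a minimal sketch (the function name `confusion_stats` and its argument names are illustrative, not part of the original dashboard), the arithmetic can be reproduced in plain Python, treating Benign as the positive class as the output states:

```python
def confusion_stats(tp, fp, fn, tn):
    """Summary statistics for a 2x2 confusion matrix.

    tp: predicted positive, reference positive
    fp: predicted positive, reference negative
    fn: predicted negative, reference positive
    tn: predicted negative, reference negative
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Agreement expected by chance, from the row and column totals.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return {
        "accuracy": accuracy,
        "kappa": (accuracy - p_chance) / (1 - p_chance),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "pos_pred_value": tp / (tp + fp),
        "neg_pred_value": tn / (tn + fn),
        "prevalence": (tp + fn) / n,
        "detection_rate": tp / n,
        "detection_prevalence": (tp + fp) / n,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Cell counts taken from the K-means confusion matrix (positive class: Benign).
kmeans_stats = confusion_stats(tp=107, fp=29, fn=0, tn=35)
for name, value in kmeans_stats.items():
    print(f"{name:>22}: {value:.4f}")
```

Read this way, the low specificity (35 of 64) shows that the clustering labels 29 malignant tumors as benign, which is the costly error in this setting.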
                                          

SVM Model

Confusion Matrix

Confusion Matrix and Statistics

           Reference
Prediction  Benign Malignant
  Benign       106         3
  Malignant      1        61
                                          
               Accuracy : 0.9766          
                 95% CI : (0.9412, 0.9936)
    No Information Rate : 0.6257          
    P-Value [Acc > NIR] : <2e-16          
                                          
                  Kappa : 0.9497          
 Mcnemar's Test P-Value : 0.6171          
                                          
            Sensitivity : 0.9907          
            Specificity : 0.9531          
         Pos Pred Value : 0.9725          
         Neg Pred Value : 0.9839          
             Prevalence : 0.6257          
         Detection Rate : 0.6199          
   Detection Prevalence : 0.6374          
      Balanced Accuracy : 0.9719          
                                          
       'Positive' Class : Benign          
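
The McNemar p-values reported for the two models differ sharply, which reflects how unevenly each model's errors are split between the two off-diagonal cells. The sketch below assumes the test is the continuity-corrected McNemar chi-square with one degree of freedom; that assumption appears consistent with the printed values but is not stated in the output. The helper name `mcnemar_p` is hypothetical.

```python
import math


def mcnemar_p(b, c):
    """McNemar p-value with continuity correction (assumed variant).

    b, c: the two off-diagonal counts of a 2x2 confusion matrix.
    """
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # For a chi-square variable with 1 degree of freedom,
    # P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2))


# Off-diagonal counts from the two confusion matrices in this dashboard.
p_kmeans = mcnemar_p(29, 0)  # K-means: all 29 errors fall in one cell
p_svm = mcnemar_p(3, 1)      # SVM: 4 errors, split 3 to 1

print(f"K-means: {p_kmeans:.3e}")
print(f"SVM:     {p_svm:.4f}")
```

A small p-value here means the errors are systematically one-sided (K-means never predicts Malignant for a benign tumor but often misses malignant ones), while the SVM's few errors show no clear direction.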
                                          

Prediction: SVM model

John: patient number 20
        id diagnosis
20 8510426         B
Mary: patient number 19
       id diagnosis
19 849014         M

Model Testing Function (uses a single test record)

[1] "Patient ID:8510426 => Result: Benign"
[1] "Patient ID:849014 => Result: Malignant"

Submission Output (uses the test dataset)

id        predict_diagnosis  origin_diagnosis  correct
842517    M                  M                 True
84348301  M                  M                 True
844359    M                  M                 True
84501001  M                  M                 True
84862001  M                  M                 True
8510426   B                  B                 True
8510653   B                  B                 True
852552    M                  M                 True
852631    M                  M                 True
854002    M                  M                 True
854253    M                  M                 True
855138    M                  M                 True
855563    M                  M                 True
857010    M                  M                 True
857155    B                  B                 True
858986    M                  M                 True
859283    M                  M                 True
859487    B                  B                 True
859717    M                  M                 True
859983    B                  M                 False
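
The `correct` column is simply a comparison of the predicted and original labels. As a small illustration (the variable names are hypothetical, and the rows are copied from the listing above, not from the full test set), it can be recomputed and summarized like this:

```python
# (id, predict_diagnosis, origin_diagnosis) for the rows shown above.
rows = [
    ("842517", "M", "M"), ("84348301", "M", "M"), ("844359", "M", "M"),
    ("84501001", "M", "M"), ("84862001", "M", "M"), ("8510426", "B", "B"),
    ("8510653", "B", "B"), ("852552", "M", "M"), ("852631", "M", "M"),
    ("854002", "M", "M"), ("854253", "M", "M"), ("855138", "M", "M"),
    ("855563", "M", "M"), ("857010", "M", "M"), ("857155", "B", "B"),
    ("858986", "M", "M"), ("859283", "M", "M"), ("859487", "B", "B"),
    ("859717", "M", "M"), ("859983", "B", "M"),
]

# A row is correct when the predicted label equals the original label.
correct_flags = [predicted == original for _, predicted, original in rows]
mismatched_ids = [row[0] for row, ok in zip(rows, correct_flags) if not ok]

print(f"{sum(correct_flags)} of {len(rows)} rows correct")
print("Mismatched patient IDs:", mismatched_ids)
```

The single mismatch in this excerpt is a malignant tumor predicted as benign.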